Journal of Global Health

Comparing verbal autopsy cause of death findings as determined by physician coding and probabilistic modelling: a public health analysis of 54 000 deaths in Africa and Asia

Background Coverage of civil registration and vital statistics varies globally, with most deaths in Africa and Asia remaining either unregistered or registered without cause of death. One important constraint has been a lack of fit-for-purpose tools for registering deaths and assigning causes in situations where no doctor is involved. Verbal autopsy (interviewing care-givers and witnesses to deaths and interpreting their information into causes of death) is the only available solution. Automated interpretation of verbal autopsy data into cause of death information is essential for rapid, consistent and affordable processing.

"Civil registration and vital statistics don't quicken everyone's pulse." So wrote Richard Horton [1] in summarising the first Global Summit on Civil Registration and Vital Statistics (CRVS), held in Bangkok in April 2013. But, as was clear from that meeting, global understanding of public health depends on having an adequately comprehensive overview of cause-specific mortality patterns at the population level. Counting people and their life events is a big part of what needs to be done more effectively and comprehensively [2]; added to that is the need to attribute cause to deaths in a systematic, rapid, consistent and cost-effective way.
Unsatisfactory progress in CRVS over recent decades lay at the heart of the four major objectives of the WHO Commission on Information and Accountability for Women's and Children's Health (COIA) [3]. Accountability at every level ultimately depends on effectively counting individuals, and then making good use of those data. Implementation of COIA's recommendations was entrusted to an independent Evidence Review Group (iERG), which, in its 2013 report [4], acknowledged that COIA's recommendation on enhancing CRVS will be "difficult or impossible to achieve" by the target date of 2015. Instead, iERG now recommends making effective CRVS a post-2015 development target. While there are evidently many practical obstacles to achieving reliable CRVS on a global scale, one prerequisite component is the availability of fit-for-purpose tools for registering deaths and assigning cause of death. Such tools must be openly accessible, and be capable of delivering consistent and systematic mortality data in a timely and cost-effective manner.
Verbal autopsy (VA; interviewing a care-giver, relative or witness after a death, and using the interview material to determine cause of death) is seen as an essential interim approach for filling in some of the gaps in global knowledge on cause-specific mortality [5], which can otherwise only be estimated [6]. Although, in the long-term, one might hope for universal physician certification of deaths, undertaken methodically and rigorously, this will not be the case for most deaths in Africa and Asia for the foreseeable future. The immediate public health concern therefore is to establish VA methods for determining cause of death which are readily applicable on a large scale (including in routine CRVS processes) and provide sufficient detail for effective health planning.
Verbal autopsy interview material has been collected in a variety of ways, and then interpreted into cause of death data by various methods. There has therefore been substantial methodological heterogeneity involved, which can magnify existing uncertainties over cause-specific mortality. The World Health Organization (WHO) released a new standard for VA data collection together with a revised set of cause of death categories (with equivalence to the International Classification of Diseases version 10 [ICD-10]) in 2012 [7]. The process undertaken to streamline previous VA approaches into the new 2012 WHO VA standard is described in detail elsewhere [5].
Ways of interpreting VA data essentially fall into two groups: physician consideration of individual cases (physician-coded verbal autopsy, PCVA) and various mathematical approaches to automated processing of VA data. PCVA has been a de facto standard in many research settings, although associated details of methods and validity have not always been well established [8] other than in specific studies of hospital-based deaths. PCVA is generally considered too slow and expensive for routine CRVS implementation, apart from the disadvantage of consuming often scarce physician time. A number of approaches to automated processing have been tried over the last decade or so; currently the most widely used is the InterVA suite of models, which apply Bayesian probabilistic modelling and have been in the public domain in various versions since 2005 (at www.interva.net) [9]. Corresponding to the release of the 2012 WHO VA standard, InterVA-4 was released in 2012, incorporating exactly the same range of input and output parameters as specified by WHO [10].
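To make the idea of Bayesian probabilistic interpretation concrete, the following is a toy sketch of sequential Bayesian updating over a set of candidate causes. It is not the InterVA-4 algorithm or its probability base: the function name `bayes_update`, the two causes, the two indicators and all probabilities are hypothetical values chosen purely for illustration.

```python
def bayes_update(priors, likelihoods, indicators_present):
    """Update cause probabilities for each reported indicator using Bayes' theorem.

    priors: {cause: P(cause)}
    likelihoods: {cause: {indicator: P(indicator | cause)}}
    indicators_present: indicators reported in the VA interview
    All inputs here are hypothetical, not InterVA-4 values.
    """
    posterior = dict(priors)
    for indicator in indicators_present:
        # Multiply each cause's current probability by the conditional
        # probability of seeing this indicator given that cause.
        posterior = {
            cause: p * likelihoods[cause][indicator]
            for cause, p in posterior.items()
        }
        # Renormalise so the probabilities over all causes sum to one.
        total = sum(posterior.values())
        posterior = {cause: p / total for cause, p in posterior.items()}
    return posterior


# Hypothetical two-cause example.
example_priors = {"malaria": 0.5, "pneumonia": 0.5}
example_likelihoods = {
    "malaria": {"fever": 0.8, "cough": 0.2},
    "pneumonia": {"fever": 0.6, "cough": 0.8},
}
example_posterior = bayes_update(
    example_priors, example_likelihoods, ["fever", "cough"]
)
```

In this toy case the reported cough shifts probability towards pneumonia even though fever alone slightly favoured malaria, which is the kind of evidence accumulation a probabilistic model performs over many more indicators and causes.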
Nevertheless, monitoring cause-specific mortality is a long-term process, and so much of the existing VA material which is archived in various places reflects earlier standards and variations. It will be some time yet before any substantial body of VA data originally collected according to the provisions of the 2012 WHO VA standard becomes available. Our aim in this paper is to take VA archives from a variety of pre-2012 sources, which have also been assessed by PCVA, convert them insofar as is possible into the 2012 WHO format, and compare the PCVA and InterVA-4 findings. Our objective is primarily methodological. Rather than attempting to illuminate specific epidemiological findings, we evaluate the consistency between applying the 2012 WHO VA standard and the corresponding InterVA-4 model to existing secondary data, and compare this with the primary physician-coded findings from the same data. The underlying consideration is the public health consistency and relevance of the two approaches, InterVA-4 and PCVA, as a source of information for health planning in regions where routine cause-specific mortality data are scarce. Many national and regional public health practitioners are posing the question as to whether they can reasonably rely on verbal autopsy surveillance with automated methods for assigning cause of death to monitor mortality patterns in the populations they serve: this study aims to answer that question.
using the conversion table specified in the WHO documentation. Age-groups corresponding to WHO 2012 categories (0-28 days, 1-11 months, 1-4 years, 5-14 years, 15-49 years, 50-64 years and 65+ years) were used as the basis for analysis. Because of inherent uncertainty at the individual level in differentiating in many cases between the 01.03 HIV/AIDS and 01.09 pulmonary TB cause categories, both for InterVA-4 and PCVA, comparisons are presented with those categories separate and combined.
CSMFs were calculated for each source and cause of death, separately for InterVA-4 and PCVA findings. Concordance between InterVA-4 and PCVA CSMFs was measured using Lin's concordance correlation coefficient [19], corrected and implemented for Stata [20]. As an alternative metric for assessing the equivalence of CSMFs from InterVA-4 and PCVA findings, we used the Wilcoxon matched-pairs signed ranks test and its two one-sided tests (TOST) variant for stochastic equivalence, with epsilon set to 3, as implemented for Stata [21]. Ratios of CSMFs according to InterVA-4 and PCVA, by source, age-sex group and cause, were calculated together with 99% CIs, according to the Katz adjusted log method, which permits the estimation of intervals around ratios where one side is zero [22]. CIs were calculated at the 99% level as hundreds of separate ratios were assessed. The objective of calculating these CIs was not so much to demonstrate statistical significance as to identify particular causes and age-sex groups for which the CSMF ratios between interpretations by InterVA-4 and physicians were appreciably lower or higher than might be expected by chance, taking into account the number of cases involved.
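As an illustration of the concordance measure described above, the following minimal Python sketch computes CSMFs from lists of assigned causes and then Lin's concordance correlation coefficient between two sets of CSMFs. It is not the Stata implementation used in this study. The function names are hypothetical, each death is assumed to carry a single assigned cause (InterVA-4 outputs may be handled differently in practice), and the coefficient uses the population (1/n) form of the variances and covariance.

```python
from collections import Counter


def csmf(causes):
    """Cause-specific mortality fractions (percent) from one cause label per death."""
    counts = Counter(causes)
    total = sum(counts.values())
    return {cause: 100.0 * n / total for cause, n in counts.items()}


def align(csmf_a, csmf_b):
    """Return two CSMF lists over the same sorted set of causes (0 where absent)."""
    all_causes = sorted(set(csmf_a) | set(csmf_b))
    xs = [csmf_a.get(c, 0.0) for c in all_causes]
    ys = [csmf_b.get(c, 0.0) for c in all_causes]
    return xs, ys


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson's r, the denominator penalises a shift in means,
    # so the coefficient measures agreement with the line of equivalence.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

For example, `lins_ccc(*align(csmf(model_causes), csmf(physician_causes)))` would give the concordance between two cause lists for the same deaths, where `model_causes` and `physician_causes` are placeholder names for the two sets of assignments.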
No specific ethical clearance was required for this study, which relied solely on the analysis of existing secondary data, without individually identifiable information. For the Kenya data set, in Kisumu, following cultural customs, compound heads provide written consent for all compound members to participate in the HDSS activities. Any individual can refuse to participate at any time. The Kisumu

DATA SOURCES AND METHODS
For the purposes of this comparison, we have selected several VA data sets for secondary analyses on grounds of availability, variety of original VA procedures, coverage of diverse geographic locations and population groups, and well-established local PCVA procedures. PCVA procedures varied slightly between sites, but for every site the consensus "main" or "underlying" cause was used here. The sources and characteristics of the data are shown in Table 1. Data were sourced from Afghanistan, Bangladesh, Ghana, Kenya and South Africa. The original sources were of two main types, Demographic and Household Surveys (DHS) [17] and INDEPTH Network Health and Demographic Surveillance Systems (HDSS) [18], but there were also local variations in the details of VA procedures used within these two groupings. The locations also cover a wide range of HIV and malaria prevalences, which are the two causes of death which vary most markedly geographically. The two sites in South Africa are only 600 km apart and share a number of characteristics, but used different VA procedures. All of the PCVA results were reported using ICD-10 codes, enabling direct comparison with the InterVA-4 outputs using the WHO 2012 ICD-10 cause category definitions.
Stata command files were created for each site to extract as many as possible of the 2012 WHO InterVA indicators for each case (possible indicators total 244 across all age-sex groups, with the number of applicable questions for any particular death ranging from 54 to 181) from the various VA data sets. VA records which did not contain any symptom data (ie, only identification and background indicators) or which did not include valid age and sex details were excluded. The VA data from each source were then processed using InterVA-4 (version 4.02) and the cause of death outputs processed into cause-specific mortality fractions (CSMF) as previously described [10]. PCVA outputs, specified as ICD-10 codes, were categorised into the 2012 WHO VA cause of death groups for comparative purposes,

RESULTS
Over the total of 54 182 VA records analysed, Table 2 shows concordance correlation coefficients by data source and by age-group, both for the basic outputs and with the HIV and TB categories combined for sub-Saharan Africa. Figure 1 shows, for each WHO 2012 cause category and over all the six sources, a scatter plot of CSMFs from both InterVA-4 and PCVA interpretations. The corresponding concordance correlation coefficient was 0.831 (95% CI 0.751-0.911), and this increased to 0.974 (95% CI 0.961-0.987) when the 01.03 HIV/AIDS and 01.09 pulmonary TB cause categories were combined for sub-Saharan Africa. Table 3 shows results from the alternative Wilcoxon's metric for equivalence between CSMFs. Equivalence is represented by the large p values for the standard Wilcoxon's signed rank test (not permitting rejection of the null hypothesis of no difference) together with significant p values indicating that differences lay within the equivalence range.
Graphical presentations for each source separately, in a similar format to Figure 1, are available in Online Supplementary Document, which also show WHO 2012 cause categories. Table 4 shows the CSMF for each WHO 2012 cause category and site, as determined by InterVA-4 and PCVA.
Using the CSMFs shown in Table 3 for each cause and source, CSMF ratios InterVA-4:PCVA were calculated with 99% confidence intervals as a basis for comparison. These are tabulated fully in Additional File 1. Of the 320 source/cause comparisons that were made, 171 (53.4%) of these ratios were not significantly different from unity at the 99% level.
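The way a CSMF ratio and its 99% confidence interval can be obtained, and then checked against unity, can be sketched as follows. This is an illustrative Python version of an adjusted log interval, in which 0.5 is added to each count and each denominator so that a zero count still yields a finite interval; it is not the code used in this study and may differ in detail from the method cited as [22]. The function names are hypothetical, and the formula treats the two proportions as independent, which is a simplification here because both methods were applied to the same deaths.

```python
import math
from statistics import NormalDist


def csmf_ratio_ci(a, n1, b, n2, level=0.99):
    """Ratio of two proportions (a/n1):(b/n2) with an adjusted log interval.

    a, b: deaths assigned to the cause by each method (may be zero)
    n1, n2: total deaths assessed by each method
    Adding 0.5 to every count is the assumed adjustment for zero cells.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    a_adj, b_adj = a + 0.5, b + 0.5
    n1_adj, n2_adj = n1 + 0.5, n2 + 0.5
    ratio = (a_adj / n1_adj) / (b_adj / n2_adj)
    # Standard error of the log ratio.
    se = math.sqrt(1.0 / a_adj - 1.0 / n1_adj + 1.0 / b_adj - 1.0 / n2_adj)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


def differs_from_unity(lower, upper):
    """True when the interval excludes a ratio of 1."""
    return lower > 1.0 or upper < 1.0
```

With hypothetical counts, `csmf_ratio_ci(100, 1000, 20, 1000)` gives an interval lying wholly above 1, whereas equal counts give a ratio of exactly 1 with an interval spanning unity.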
CSMFs were similarly calculated by age-group and sex, across all sources. These results, in a similar format to

DISCUSSION
Our results show a generally good level of agreement between the InterVA-4 and PCVA approaches to the interpretation of this large VA data set, over diverse populations. There are some important differences, discussed below, but nevertheless the two approaches achieved good public health equivalence, meaning that taking public health and health planning measures on the basis of either source would lead to similar conclusions. This concept of "public health equivalence" is very important in interpreting these findings. Development of VA methods in recent years has led to a situation in which public health practitioners in countries where deaths are not routinely registered with causes are posing important practical questions. They need to know whether they can reasonably rely on modern VA methods with automated interpretation to provide policy-relevant information on mortality patterns in a cost-effective manner. This is not just a matter of identifying major causes of death; it is equally critical, for example, to monitor causes that have become rare, such as measles, in order to be sure of the continued effectiveness of vaccination programmes. Previous work [23,24] has shown that InterVA-4 can be effectively operationalised at much lower cost than PCVA; here we demonstrate its functional equivalence to PCVA.
It is critical to realise that neither InterVA-4 nor PCVA, nor indeed the underlying VA data to which they have been applied, necessarily represent absolute truth (whatever that may be) in terms of cause of death. Cause of death assignment is, at best, a mixture of science and judgement [25].
There is an extensive literature on comparisons between different methods for determining cause of death, which show substantial inter-method variations. A review of clinical cause of death assignment and post-mortem findings found rates of discrepancies ranging from 30% to 63% across the 18 included studies [26]. Pre-mortem CT imaging has been evaluated as only able to correctly identify 66% of post-mortem examination causes of death [27]. In South Africa, an autopsy series on miners found that 51% of respiratory infections diagnosed at autopsy had not been noted clinically [28]. There is a clear need to improve future VA methods by validating causes of death directly against post-mortem findings, but that is a major undertaking given the widespread lack of autopsies undertaken in Africa and Asia [29]. Against this background of high discrepancy rates between post-mortem findings and other methods of assigning cause of death, the relatively good agreement between PCVA and InterVA-4 findings here is encouraging, even though both might differ from post-mortem findings if those were available.
Attempts have been made to validate VA approaches in specific studies with hospital or laboratory data [30]. Some specific causes of death are amenable to this approach, for example by using particular data sets where ante-mortem HIV or sickle-cell status is documented [31,32]. A study from the Population Health Metrics Research Consortium recruited tertiary facility deaths across a range of hospital-assigned pre-determined causes, which were followed up with VA interviews [33]. This data set was used to build new models for assigning cause of death, which were then tested together with other models and physician-assigned causes in the same data set. Unsurprisingly, models built within this data set performed better in relation to the hospital causes than either other models or physicians [34]. Further bench-testing of VA interpretation models showed roughly equivalent performance across various models when compared to PCVA as the reference standard [35]. By defining performance in relation to PCVA, however, these evaluations precluded comparison of public health consistency between models and physicians.
Analytical methods for comparing cause of death assignment are not entirely straightforward, because of the general uncertainty associated with cause of death, the interplay between precipitating and underlying causes, and the nature of the data. Here we have concentrated on comparing CSMFs, since that is the primary outcome of interest from cause of death data in public health. The concordance correlation coefficients and rank equivalence tests used here present accessible and convenient summary measures of how CSMFs from two different sources compared. For individual cause comparisons by factors such as source, age-group and sex, the ratio between CSMFs by the two methods provides insight on specific aspects for comparison, and the confidence interval of that ratio is informative in deciding whether or not differences are due to chance. It has been suggested that comparisons between cause of death methods should be corrected for chance agreement, which is more likely to occur in common causes [36]. However, from a public health perspective this is not necessarily appropriate, since in practice agreement is generally accepted irrespective of the possibility that it was derived by chance.
The overall size and geographic diversity of the data presented here are important attributes. These VA data were not collected under carefully controlled and standardised procedures in order to minimise real-life sources of variation; this is a major strength of this study. The sources deliberately included a mix of high and low HIV and malaria settings, which are the two causes of highest variation in CSMF findings between specific settings. In any cause of death data, a relatively small number of more common causes account for the majority of the deaths, followed by many causes accounting for small fractions in the remainder. Consequently it is only possible to evaluate cause of death methods thoroughly in data sets which are large enough to include realistic numbers of rarer causes. Globally, most unrecorded deaths occur in Africa and Asia, which are therefore the regions where VA methods are most urgently needed, and which are represented in these data. It must also be noted that inevitably none of these archived data sets were originally collected under the WHO 2012 VA standard, and hence some degree of inter-site variation may have been introduced in the process of extracting the necessary VA indicator data.
One commonly contentious area in terms of cause of death is the interaction between HIV/AIDS and pulmonary TB. Three of the six data sources included substantial numbers of HIV/AIDS deaths during the periods covered by these data, and both InterVA-4 and PCVA findings reflected that. A validation study for InterVA-4 in relation to HIV serostatus showed high specificity for HIV/AIDS as a cause of death (ie, relatively few false-positive HIV/AIDS cause assignments) but also showed considerably elevated mortality rates among sero-positives for causes such as pneumonia and pulmonary tuberculosis [31]. Although ICD-10 coding in principle requires the use of codes B20-B24 where HIV and co-infections are involved, the extent to which this can reliably be implemented using VA methods is debatable, particularly if VA respondents are unaware of the HIV status of the deceased. In these analyses, there are clear differences between the two South African sources in this respect, with appreciably different proportions of deaths assigned as HIV/AIDS or tuberculosis. Conversely, in low HIV/AIDS or malaria settings, physicians may be reluctant to assign deaths to those causes. For example, in the Afghan data set, where very few HIV/AIDS deaths might be expected, HIV/AIDS was explicitly mentioned in four VA interviews, but this was not reflected in the PCVA results, which never assigned HIV/AIDS as a cause of death.
Any cause of death assignment process, at the individual level, will involve some degree of uncertainty. Formal procedures for assigning cause of death, for example in official death certificates, do not generally capture this uncertainty, but require the certifier to make a clear choice between possible causes [8]. Even if two certifiers are required to assess a case independently, as is often practised in PCVA, agreement does not necessarily constitute truth. One factor that emerges clearly from these analyses is that in the PCVA findings there is a greater tendency for physicians to choose chapter residual categories (pink markers in Figure 1), rather than specific causes (blue markers in Figure 1). This is evident from most of the pink markers lying below the line of equivalence, and is probably an expression of PCVA uncertainty. This was particularly evident in the neonatal age group, in addition to cross-over between neonatal sepsis and pneumonia categories, as seen in Online Supplementary Document,

CONCLUSIONS
Given the inherent difficulties and uncertainties involved in assigning cause of death, and the urgent need to implement large-scale, cost-effective CRVS procedures that include cause of death, it is clear that the priority for the foreseeable future in many low- and middle-income countries will be to undertake VA with automated cause of death assignment. We have shown here, using a large and diverse data set, that there is a strong correlation between in-country PCVA findings and outputs from the freely available InterVA-4 model, over a wide range of settings. Whilst accepting that neither PCVA nor InterVA-4 results necessarily represent absolute truth, and that there is a continuing search for improved methods for assigning causes of death, the use of InterVA-4 represents a low-resource and highly consistent strategy, which is a major advance on knowing almost nothing about cause of death profiles in many populations. The diversity of cause of death profiles which InterVA-4 produces across the various sources clearly demonstrates that a standard model can be used successfully over a wide range of settings. InterVA-4, and the WHO 2012 VA standard with which it is fully compatible, should therefore be used as the currently available tools of choice for filling gaps in cause-specific CRVS data.